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Abstract 

We discuss here the mean-field theory for a cellular automata model of meta-learning. The meta- 
learning is the process of combining outcomes of individual learning procedures in order to 
determine the final decision with higher accuracy than any single learning method. Our method is 
constructed from an ensemble of interacting, learning agents, that acquire and process incoming 
information using various types, or different versions of machine learning algorithms. The 
abstract learning space, where all agents are located, is constructed here using a fully connected 
model that couples all agents with random strength values. The cellular automata network 
simulates the higher level integration of information acquired from the independent learning 
trials. The final classification of incoming input data is therefore defined as the stationary state of 
the meta-learning system using simple majority rule, yet the minority clusters that share opposite 
classification outcome can be observed in the system. Therefore, the probability of selecting 
proper class for a given input data, can be estimated even without the prior knowledge of its 
affiliation. The fuzzy logic can be easily introduced into the system, even if learning agents are 
build from simple binary classification machine learning algorithms by calculating the percentage 
of agreeing agents. 
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Introduction 

The Machine Learning (ML) algorithms allow computers to learn based on training data. A major 
focus of ML is to recognize complex patterns in datasets, or make intelligent decisions based on 
data. Typically ML algorithms are divided into several classes: 1) supervised learning (generated 
a function that maps input data into desired outputs); 2) unsupervised learning (model a set of 
inputs, where no prior classification is given); 3) semi-supervised learning (generate an 
appropriate function or classifier); 4) reinforcement learning (learn how to act given an 
observation of the world, where every action has some impart in the environment, with feedback 
of it back to the algorithm); 5) transduction (predicts new outputs based on training inputs, 
outputs and test inputs); and 6) learning to learn (learns its own inductive bias based on previous 
experience) [1]. Different algorithms of ML have been applied successfully to solve real-life 
problems, for example in the context of bioinformatics [2-6] or chemoinformatics problems [7- 
12]. 

The current trend of machine learning theory and applications is to develop meta-learning 
techniques, since no single paradigm is superior to others in all possible situations [13-18]. What 
is the exact definition of Meta-Learningl Most ML researchers do not share any single definition 
of this term, therefore it is very important to precisely define what constitutes meta-learning in 
this manuscript. Traditionally this term refers to self-adaptation of ML methods, i.e. to 
readjusting their hypothesis spaces in order to fits better to changing environment and to allow 
for higher precision classification of incoming data, or making more accurate predictions about 
previously unknown cases. Yet, this very general statement cannot be easily described in terms of 
statistical mechanics and practical applications of such methodology. Therefore, I define here the 
meta-learning as the process of combining outcomes of individual learning procedures in order to 
determine the final decision. In that way, the individual learners perform typical machine learning 
procedure, then their predictions are gathered and integrated. This definition is very similar to 
other "meta" approaches in bio- and chemo-informatics, that are now of great interest in the field 
of computational biology and biophysics. My approach is currently well supported by recent 
advances in the protein fold recognition field that is dominated by the meta predictors like 3D- 
Jury [19, 20], Peons [21, 22], Roberta [23-25]. Multiple tests confirmed that consensus methods 
were more powerful than individual prediction algorithm in sensitivity and specificity even if 
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some meta-predictors used as little as three methods to build a consensus model. Yet, the problem 
of selecting the proper and most successful procedure of meta-learning is still ongoing debate. 
Here, I provide a simple procedure for the integration of results from different methods into 
single prediction that complements previous approaches [14-18, 26]. As I mentioned above in the 
Meta-Learning (ML) one trains an ensemble of machine learning algorithms using different types 
of input training data representations [7, 27-29]. All possible solutions are gathered, and the 
consensus is build between them. The final phase of learning, i.e. consensus learning, is trying to 
balance the generality of solution and the overall performance of trained model. This approach is 
similar to other ensemble methods, yet differently from bagging (combines many unstable 
predictors to produce a ensemble stable predictor), or boosting (combines many weak but stable 
predictors to produce an ensemble strong predictor), it focuses of the use of heterogeneous set of 
algorithms in order to capture even remote, weak similarity of the predicted sample to the 
training cases. The main problem with such meta-approaches is that they are static and very 
specific. The meta-approach is optimized for certain combination of machine learning types of 
algorithms and selected particular representation of training data. Nevertheless, the model of the 
consensus should be in principle calculable in more general way. 

Therefore, the goal of this manuscript is to provide a general, theoretical framework for the 
general integration of results individual machine learning algorithms. In order to perform 
analytical analysis, I assume infinite, statistical ensemble of different ML methods. The global 
preference toward true solution can be described in my approach as the global parameter 
affecting all learners. Each learner (intelligent agent) performs training on available input data 
toward classification pressure described by the set of positive and negative cases. When the query 
testing data is analyzed each agent predicts the query item classification by "yes'V'no" decision. 
The answers of all agents are then gathered and integrated into the single prediction via 
dynamical evolution described within cellular automata framework. This dynamical view of the 
consensus between various machine learning algorithms is especially useful for artificial 
intelligence, or robotic applications, where adaptive behavior given by the integration of results 
from ensemble of ML methods. 
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Brainstorming: Cellular Automata model of Meta-Learning 

My model of learning is based on nonlocal cellular automata (C A) approach known from physics 
[30-34], with a wide range of applications [35-43]. The first statistical mechanics of opinion 
formation in groups of individuals was proposed by Lewenstein et al. [44] on the class of models 
that were based on probabilistic cellular automata and social impact theory introduced by Latane 
[45, 46]. The mean-field theory with intermittent behavior was observed with a variety of 
stationary states with a well-localized and dynamically stable clusters (domains) of individuals 
who share minority opinions [44]. The impact of a group of N agents on a given learner is 
proportional to three factors: 1) the "strength" of the members of the whole ensemble, 2) their 
"social" distance from the individual, and 3) their number N. Such model leads to ferromagnetic 
and spin-glass phases, when different values of persuasiveness and supportiveness are assumed. 
Than this approach was successfully used in a variety of sociological phenomena, with an 
interesting extension of the model was done by Kohring [47, 48], where Latane's theory was 
extended to include learning. The cellular automata with intrinsic disorder was later solved 
analytically in the continuous limit by Plewczynski [49], and proved that in the model of 
Cartesian social space (therefore not fully connected) and containing no learning rules, one can 
also observe different phases (small clusters in the sparse phase with large role of strong 
individuals, and high density phase with almost uniform opinion). The later results of Holyst et 
al., where numerical simulations and analytical models were tested in simplified geometries, 
proved the usefulness of mean-field formalism in describing the social impact theory, and the 
presence of the equilibrium states of the system with complex intermittent behavior [50-53]. 
In the present manuscript I present a novel application of cellular automata models to the meta- 
learning problem. Each cell in CA represent single machine learning algorithm, or certain 
combination of parameters, and optimization conditions affecting the classification output of this 
particular method. I call this single learner by the term "learning agent", each characterized for 
example by its prediction quality on a selected training dataset. The coupling of individual 
learners is described by short-, medium- or long-range interaction strength, so called learning 
coupling. The actual structure, or topology of coupling between various learners is described 
using term "learning space", and can have different representations, such as Cartesian space, 
fully connected or hierarchical geometry. The result of the evolution, dynamics of such system 
given by its stationary state is defined here as the consensus equilibration. The majority of 
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learners define, in the stationary limit, the "learning consensus" outcome of the meta-learning 
procedure. For example by the total difference between positive and negative predictions in the 
binary classification problem. 

The information integration, i.e. the consensus building between various machine learning 
algorithms, various prediction outcomes, is similar to the dynamical changes in cellular automata 
systems known from physics. The phase transitions can be observed in the system, the global new 
phase emerging when the system reaches a critical point in terms of its order parameter. Changes 
between phases of the system are induced by some external factors that can be modeled as a bias 
added to the local fields. Mean field theory was successfully used in a plenty of physical 
problems, such as a super-fluid effects [54], weakly interacting Bose gas in external field [55-59], 
quantum solitons in optical fibers [60, 61] and many others [62-68]. The main difference between 
my approach and those formalisms is the procedure of taking real function instead of complex 
one. The individual learners are modeled here, as single machine learning procedures, therefore 
each of them is described by set of calculable real values depending on the each ML method's 
precision, recall and classification error, therefore assuming the input training data. My approach 
is supervised learning because the description of the quality for each agent depends on the type or 
representation of the training data. On the other hand, the presented formalism is more general 
allowing for unsupervised learning, i.e. for searching for unknown patterns in input data, by 
representing them on the fully connected grid, and allowing for features similarity exchange. This 
approach is now studied by Plewczynski et al. in the context of two wide range applications in 
bioinformatics: namely the protein-protein interactions prediction and protein-ligand docking. In 
the first case, the whole known proteome of an organism is represented on the two dimensional 
grid, where each node represent single protein. The coupling between two proteins is given by the 
structural and sequence similarity focused on likely interactions sites on the surfaces of proteins 
[69]. This approach allows for unsupervised protein-protein interactions prediction, taking 
different route, namely unsupervised meta-learning based on cellular automata and phase 
transitions than previous methods [70-74]. 

The cellular automata model of meta-learning is based on several assumptions: 
1. Binary Logic 

I assume the binary logic of individual learners, i.e. we deal with cellular automata 
consisting of N agents, each holding one of two opposite states ("NO" or "YES"). These 
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states are binary cr. =+1, similarly to Ising model of ferromagnet. In most cases the 

machine learning algorithms that can model those agents, such as support vector 
machines, decision trees, trend vectors, artificial neural networks, random forest, predict 
two classes for incoming data, based on previous experience in the form of trained 
models. The prediction of an agent answers single question: is a query data contained in 
class A ("YES"), or it is different from items gathered in this class ("NO"). 

2. Disorder and random strength parameter 

Each learner is characterized by two random parameters: persuasiveness p i and 

supportiveness s i that describe how individual agent interact with others. Persuasiveness 

describes how effectively the individual state of agent is propagated to neighboring 
agents, whereas supportiveness represent self-supportiveness of single agent. In present 
work I assume that influential agents has high self-esteem p i = s t , what is supported by 
the fact that highly effective learners should have high impact on others in meta-learning 
procedure. For example, we can select p i = f (precision, i) and s t = f (recall, i) in the 

case where agents are modeled as single machine learning procedures. In general the 
individual differences between agents are described as random variables with a 

probability density p = (p i ,s i ), with mean values p = an d s = • Similarly to 

the social influence theory, the quality of predictor in some way affect its influence 

strength, when the final optimization of meta-learning consensus is done. 

In the case of meta-learning procedure the persuasiveness p, represents here the ability 

of learning agent j to persuade agents who hold the opposite state to switch to having the 
same state as j. The supportiveness Sj represents the ability of learning agent j to support 

agents who hold the same state, so not only the self-support of an individual agent (to 
itself), but the support that an agent gives to other agents who share the same state as it 
has. 

3. Learning space and learning metric 

Each agent is characterized by a location in the learning space, therefore one can calculate 
the abstract learning distance d (i, j) of two learners i and j. The strength of coupling 
between two agents tend to decrease with the learning distance between them. 
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Determination of the learning metric is a separate problem, and the particular form of the 
metric and the learning distance function should be empirically determined, and in 
principle can be a very peculiar geometry. In present manuscript, I select the fully 

connected learning space, where all distances between agents are equal d{i,j~) = \. This 

particular geometry is useful for example in the case of simple consensus between 
different yet not organized machine learning algorithms, where no group of learners 
perform significantly better than others. 
4. Learning coupling 

Agents exchange their opinions by biasing others toward their own classification 
outcome. This influence can be described by the total learning impact L that z'th agent is 

experiencing from all other learners. Within the cellular automata approach this impact is 
the difference between positive coupling of those agents that hold identical classification 
outcome, relative to negative influence of those who share opposite state, and can be 
formalized as 



V i iV J U iV 



(1) 



where I (.) and I s (.) are the functions of persuasiveness and supportiveness impact of 

the other agents on the z'-th agent. It should be noted here that the persuasiveness » . 

represents here the ability of agent j to persuade agents who hold the opposite state to 
switch to having the same state as j. On the contrary the supportiveness s } represents the 

ability of agent j to support agents who hold the same state, i.e. preventing them from 
switching to the opposite state. That is, persuasiveness represents the propensity of j to 
cause other agents to switch to her state, and supportiveness represents her propensity to 
keep them there. 
5. The equations of meta-learning 

The equation of dynamics of the learning model defines the state a\ of z'th individual at 
the next time step as follows: 

cr] =(-sign(a i I i )), (2) 

with rescaled learning influence: 
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I assume a synchronous dynamics, i.e. states of all agents are updated in parallel. In 
comparison to standard Monte Carlo methods the synchronous dynamics takes shorter 
time to equilibrate than serial methods, yet it can be trapped into periodic asymptotic 
states with oscillations between neighboring agents. 
6. Presence of noise 

The randomness of state change (phenomenological modeling of various random 
elements in the learning system, and training data) is given by introducing noise into 
dynamics: 



where h t is the site-dependent white noise, or one can select a uniform white noise, where 
for all agents k = h . In the first case h t are random variables independent for different 
agents and time instants, whereas in the second case h are independent for different time 
instants. I assume here, that the probability distribution of h t is both site and time 

independent, i.e. it has uniform statistical properties. The uniform white noise simulates 
the global bias affecting all agents, whereas site-dependent white noise describes local 
effects, such as prediction quality of individual learner etc. 
The system defined in this way is similar to previously postulated cellular automata models of 
opinion change in social sciences [44, 49, 50]. The main differences of those approaches from the 
previously described cellular automata models is given by the infinite-range interactions and fully 
connected cellular automata, that are better fitted to the learning context of the problem. In 
addition, the random strength parameters are introduced, therefore allowing for more complex 

behavior to be observed. Individual agents are described using probability density p = (p i ,s i ), so 

they differs from each other. The impact function is also included, so learners are able to 
exchange their states in the form of coupling. 

Mean-Field approximation 

The fully connected learning space geometry present an interesting practical formalizations for 
further analysis of meta-learning procedure. Here, all agents are coupled with each other with 




(4) 
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some randomly distributed strength that is independent on the distance between them. The mean- 
field theory provides very well defined and controlled approximation allowing for solving the 
dynamical equations of such model. The dynamical "order" parameter has to be defined, to show 
the decay of minority groups in the form of "staircase" dynamics. The fully connected geometry 
of learning space is supported by topology of this problem. We have a set of independent 
machine learning algorithms, no prior hierarchy or topology for this set is imposed, and we would 
like to build the consensus between their predictions. 
The discrete equation of dynamic is given by: 



cr = -sign 



(5) 



Introducing a weighted majority-minority difference for a system: 



m 



j N(s + p) 



and random parameters to describe effective self-supportiveness of each agent: 

a i =—^ + -^s i , (7) 
s + p s + p 

we get the dynamical equation in noise absent limit by rewriting the Eq. 5 using 6>(.) as 
Heaviside theta function: 

cr. = sign (m + h t ) &(jm + h t | — |a ; . |) + a t sign (a i ) | — |m + h t | J . (8) 

We assume here that a i > for any distribution of random variable s ; [44]. 

The order parameter is defined in physics as a quantity that defines the phase transition between 
various phases in the physical system, for example defining the evolution of the ordered system 
toward chaotic behavior. In this model, the order parameter describes the changes between 
various meta-learning solutions (agreement between learners, uniformity of opinion, minority 
clusters, chaotic state of not coupled learners) and it is given by the formula: 

*M-Sfc^'(«,-*). w 

as in the standard mean-field theory of Ising systems [44, 47, 62]. The meaning of this parameter 
is as follows: n is a positive real number, and it is equal to the weighted majority-minority 
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difference calculated for those agents that have effective self-supportiveness a y greater that cp. 

The order parameter defines different, stationary states of the dynamics, in the noiseless limit it 
determines uniquely the approach toward a stationary state, and with small noise the only 
stationary states are close to uniformity with m = ±1 . From the practical point of view it is 
enough to describe the ordering among agents of a given strength in order to completely specify 
the state of the system. 

djricp) 

The derivative is related to the weighted majority-minority difference for agents with 

dq> 

effective self-supportiveness a j is equal to cp, i.e. for those agents with 
Sj =-^\ ( p{ s + p)~( s ~ p\\ • As in the work of Lewenstein [44] the order parameter n i in the 

noiseless limit fulfills the equation: 

7i {(p) = |^//(m,(s») + ^"(|w|^6'(|w|-^) + ^"(^)6'(^-|w|), (10) 

with ^"(O) = m , and 

M{m,cp)= y N ^wz%^^^(H-kl)^(«,-^)- (11) 

The mean-field approximation for the system is introduced by replacing the actual variables (like 
m , 7r{q>\, ju(m,(p) by their corresponding mean values calculated by averaging over disorder, 

i.e. random distribution of self-supportiveness s j and p t . The averaged equations are then valid 
for very large TV (preferably infinite system), where m is no longer random variable, because its 
fluctuations are of order of J/jjy ■ The recurrence equation for m is defined for cp = , i.e. in 

regions where <p > \m\ — : 

m =//(m,0) + 7r(|m|) . (12) 
Reformulating the definition of n{<p) gives: 

*W-"-2%^ Lfl ("-' , >)- (13) 

Therefore the recurrence equation for m has the following form [44]: 
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m = m + z ^C(s+p) i sign ( m ) - a J ■ ] d (p - a J ) • (i4) 

Let's define the function / (m) as the value of m in the consecutive time steps: 

/ (m) = m + ^ ^ ^ ^ [«gf ( w) - C7 ft = 0)] fl(|m| - a, ) . (15) 

Therefore, we have the mean-field behavior of the recurrence equation described by the general 
equation: 

/ (m ) = m + X ^ ~ ^ y [«g* H ~ (* = °)] * ~ a j ) ■ ( 16 > 
j N[s + p) L 

In this meta-learning model each agent is described by two random parameters: persuasiveness 
p t (the coupling with other agents) and supportiveness s t (self-supportiveness). The first 

parameter, persuasiveness, imposes how effectively the individual state of agent is propagated to 
neighboring agents, whereas the second parameter, supportiveness, represents self-influence of a 
single agent. We can select for each type of machine learning algorithm, or instance of single ML 

method (depending for example on the methods' parameters values) a p i = / (precision, i) and 
s i = f [recall, i) . In present manuscript I assume that influential agents has high self-esteem: 

p i =j. = h (precision, recall, /)□ ^[precision[i] + recall[i]^ . (17) 

Here, highly effective learners should have high impact on others in meta-learning procedure. 
The individual differences between agents are described as random variables with a probability 

density p = (p i ,s i ), with mean values p = an d 5 = Z~~> therefore the recurrence 

equation (16) has a simplified form: 

f(m) = m +^^- [sign(m) - cr j (t = 0)] [<p - a . ) , (18) 

with 



j' Ns [ 



a, = —s i 
1 2s 1 
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The procedure of averaging is done here over different random distributions of initial conditions 
<j i (t = 0) , and different possible distributions of values of self-supportiveness s t . 
The value of m changes during the evolution of the system: 

m=y - J — L , 

and 

The recurrence equation for m is given by the formula [44]: 

m = m + ^ — [sign (m) - cr y (t = 0)1 0(jm\ — a. ) . 

And for positive initial value of m we have: 

m =//(m,0) + ^ (m). 

Reformulating the definition of n{<p) gives: 

n (<p) = m-Y 4 SjCTj ^ " > e{<p-a,), (19) 
therefore the recurrence map for m has the following form [44]: 

f(m) = m o+ ^[l-a j (t = O)]0(m-a j ). (20) 

The above equations show that / (m) is bounded and increasing function of m . The m is 

bounded and increasing function of time step, therefore it has at least one stable fixed point. It 
may have also several fixed stable points separated by unstable ones, fulfilling the equation: 

During time evolution m tend to the nearest stable fixed point from its initial value m , therefore 

if noise is not present the meta-learning gives the final answer as the local minima in the space of 
possible solutions. Therefore clusters of minorities appear in the system as generic solutions in 
the noiseless limit. When the system is forced to make a consensus prediction for example by 
majority rule, the final prediction outcome is given simply by the majority result, yet the actual 
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probability of the correct answer can be approximated as the percentage of states in agreement 
with majority rule, in comparison to the number of minority groups. 

For large enough m the system is reaches the stationary state close to the uniformity state. For 
small m the system is unstable, and grows quadraticaly in m . The similar description is valid for 
negative values of m , therefore f(m) has two stable and one unstable fixed point [44]. In the 
field theoretical formulation the system has a set of local minima describing clusters of minority 
states characterized by different s . . Those groups collapse successively with stronger s . when a 

small noise is added to the system [44]. I have to point out, that above model is valid also for 
different initial conditions, where weaker agents have random state in opposition to uniform state 
of stronger ones. If the equilibrium state of the system exists, and it is stable, the system will be 
also stable for cases, where initial state for weaker agents will be more diverse (non-uniform). 
The above solutions for the system are given by the minority clusters surrounded by the majority 
agents, and the dynamic is of "staircase" character in the presence of small noise [44, 49]. In 
general three initial states of the system can observed: sparse (no correlation between the agent 
strength and its state), middle density (a state of a agent starts to be correlated with its strength, 
lot of interesting meta-stable global configurations), and large density state (most of agents 
initially have similar state, therefore the role of coupling is not so important). In the first case 
clusters of both types of states may appear, and when the weak coupling is present there is no 
bias toward uniform solution. In the second case a variety of sophisticated geometries, shapes of 
clusters are present, some are robust and meta-stable, other disappearing slowly changing their 
state in agreement with majority rule. Here, no analytical solutions are easy to find, therefore 
computer simulations have to be applied. I leave this case to my next manuscript, where 
extensive computer modeling of the system will be presented. 

Concluding remarks 

Intelligent agents theory is a fascinating topic in modern science [75-79]. Decision making 
transitions depend to high degree on global factors influencing an ensemble of independent 
learners. On the other hand, those changes are dependent to a high degree on individual decisions 
(predictions) that are based on agents' attititudes. During consensus, i.e. the final decision 
making, the reciprocal influence is critical as each learner exchange its opinion with others. In my 
approach, I assume that external factors acting on each learner are present during only the first 
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phase of me ta- learning, where initial states for a population of learners are setting up. Yet, both 
processes even if acting on different time scales, are important for understanding the 
computational intelligence process. 

In this manuscript I have presented the statistical theory of meta-learning. In my approach I select 
long-range coupling between agents, as opposite for example to the Euclidean two dimensional 
learning space, where only nearest-neighbors are coupled. This assumption is well supported by 
the fact that we are typically focused on only equilibrium, stationary states. The fully connected 
learning space lets agents evolve faster in comparison to other types of cellular automata. In 
addition, all agents influence each other, therefore we avoid local minima traps for the global 
system. 

Each learner is characterized by two random parameters: persuasiveness pt and supportiveness Si 
that describe how individual agent interact with others. The random strength parameters simulate 
different individual features of learning agents. In principle one can define both parameters in 
various different ways. In the case of a set of machine learning algorithms, each of them can be 
described by its intrinsic parameters affecting precision of single classification model of training 
data. In general case, several different types of machine learning algorithms can be used as 
individual learners. There, the distribution of quality of local prediction can be described as 
random providing that algorithms differ significantly between each other in terms both of the 
quality of prediction (classification accuracy), recall values (the ability to memorize the positive 
items in the training dataset), or precision (the ability to precisely predict the classification of 
training items). 

The other definition of those parameters (persuasiveness and supportiveness) can enhance the 
method persuasiveness (the value of p t ), if the method has the state <7 i =+1, and make its p t 

value lower when the opposite state is taken. In this way, it allows to speed up the consensus 
process by forcing system to reach equilibrium state more rapidly, yet pushing it to the +1 
decision based on the selected training dataset. This can cause several problems with 
overtraining, therefore some limitations of this approach should be taken into account. The 
actual solutions presented in this paper, yet do not depend strongly on the selected form of those 
parameters. Anyway we assume that they are some random variables describing the variety of 
individual decisions in the ensemble of learners. 
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There two time scales in the system. The first time scale is related to the fast evolution of 
individual learners. When input testing data is presented to the system, each learner respond by 
its own single prediction. This local prediction of each agent is done very rapidly, almost 
instantly. Then those individual predictions are processed by cellular automata algorithm in order 
to find the stationary state of the system. This part is denoted as integration of information. As it 
was shown above, such stationary state has the form of minority clusters surrounded by the sea of 
majority prediction. Therefore, the final consensus prediction given by the majority rule, still 
preserves non-orthodox solutions, allowing for fast adaptivity of the system when training data 
pattern is changed. The time scale for this integrative process is relatively long in comparison to 
individual predictions, therefore very fast (preferably optimized for parallel processing) cellular 
automata software implementations have to be prepared in order to apply described above 
formalism in real life problems. In the statistical model presented here, I assume that there is no 
coupling between those two time scales. Therefore I neglect all details of individual evolution of 
learners, focusing our attention for integration phase of incoming local information into single, 
consensus answer. 

The core question of this manuscript is how typical initial distribution of learners' state evolve in 
time? As it was shown above different initial conditions distinguish by the numbers of agents 

sharing opposite opinion m - V C Tj y X J into three classes. The first class is close to uniform state 




w = 1 , where almost all learners initially are in agreement. The evolution of the system rapidly 

collapses into stationary uniform state. This situation is observed when individual learners share 
similar machine learning algorithms, or wide spectra of parameters values do not change the 
classification model. Opposite states are sparse, and randomly spread over the learning space. For 
example, most of single ML algorithms (such as Random Forest, SVM) trained on an easy or 
with moderate difficulty training dataset will give very similar prediction for a test cases. 
Therefore the initial state of the consensus system is close to uniformity of opinion, and the 
uniformity state is the most frequent final state. The second initial condition OD \m\ < 1 describes 

much richer solutions space, moderate number of agents share opposite state and those can be 
distributed randomly over the learning space, or clustered into well defined groups. This describe 
the situation, where different values of parameters can cause different classification outcomes, or 
ensemble of ML algorithms contain significantly different between each other algorithms, that 




j 
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construct distinct classification models of input training data. The system has its intrinsic 
preferences (or in other words preferable local classification model) - most of agents agree with 
their preferences, yet to some degree the opposite consensus state is possible. Therefore, one can 
assume that agreement between agents is possible, even if there is significant proportion of 

learners that classify input data oppositely. The third type of initial conditions |m| = contain 

distributed randomly or clustered different agents' states spread over the learning space. Because 
the number of opposite states is similar, therefore the system is on the edge of phase transition 
between two final consensus answers: "YES" or "NO". Therefore, even small perturbation of 
initial state, parameters change or type and nature of testing examples can in principle guide the 
system into different, opposite answers. This type of consensus is more fitted to the difficult 
training cases, where both answers are very probable. Here, the final trained system is very 
fragile and strongly depend on testing input data. The small change of input testing data can build 
up the very different consensus value. Here, the consensus as the final, stable state of the whole 
system is not obvious, and it can take a significant amount of time. The final state can be either 
randomly distributed negative learners in the majority of positive states, or clustered minorities. 
Here, I apply the analytical results of mean-field approximation for cellular automata dynamics 
with moderate- and long-range interactions known previously from findings of Lewenstein et al. 
[44], Plewczynski [49] and Holyst et al. [50, 51, 53] to novel problem of meta-learning. The CA 
model is characterized by long-range interactions between individual agents, the fully connected 
learning space (each individual learners exchange their prediction results with all others) and by 
an intrinsic disorder that allow for complex learning geometries to appear in the system. Two 
obvious emergent phenomena were observed immediately in this class of models: polarization 
and clustering [44]. In this article I prefer to use different terms for description of those emerging 
phenomena, namely integration and adaptivity. The integration is based on polarization effect 
(single majority as the stable stationary state of the system, therefore allowing for consensus 
prediction), whereas the adaptivity is given by the clustering of similar outputs of groups of 
learners (small minority clusters of non-preferred learning outcome are grouped together in the 
learning space). 

The order parameter of the system is given by the variable rj that characterizes geometrical and 
dynamical features of the model. The equilibration of the system in clustered state is given by 
intermittent, consecutive steps in the form of "stair-case" dynamics [44, 49]. First the strongest 
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agents change their state, then the weaker rest of minority cluster collapses. The final solution is 
given in most cases as single state with uniform prediction outcome. Therefore is defines the final 
answer of the consensus system, when input data is presented to the learning network of 
individual agents. The existence of this uniform solution in the limit of infinite time, when noise 
is present, is of crucial importance for further analysis of this meta-learning model. The real life 
realizations of these algorithm will be presented in forthcoming manuscripts, especially in the 
field of bio- and chemo -informatics. 

Acknowledgements: 

This work was supported by the Polish Ministry of Education and Science (N301 159735) and 
other financial sources. I would like to thank Prof. M. Lewenstein (ICREA & ICFO, Barcelona, 
Spain) and Prof. M. Niezgodka (ICM, University of Warsaw, Warsaw, Poland) for stimulating 
discussions. Author would like to thank anonymous reviewer for fruitful comments and 
suggestions that strongly enhanced the results of this work. 



17 



References 



1 . Wikipedia. Machine Learning. 2009 [cited; Available from: 
http ://en.wikipedia. org/wiki/Machine_Learning . 

2. Bacardit, J., et al., Automated alphabet reduction for protein datasets. BMC 
Bioinformatics, 2009. 10: p. 6. 

3. Plewczynski, D., et al., AutoMotif server: prediction of single residue post-trans lational 
modifications in proteins. Bioinformatics, 2005. 21(10): p. 2525-7. 

4. Bhaskar, H., D.C. Hoyle, and S. Singh, Machine learning in bioinformatics: a brief 
survey and recommendations for practitioners. Comput Biol Med, 2006. 36(10): p. 1 104- 
25. 

5. Larranaga, P., et al., Machine learning in bioinformatics. Brief Bioinform, 2006. 7(1): p. 
86-112. 

6. Mjolsness, E. and D. DeCoste, Machine learning for science: state of the art and future 
prospects. Science, 2001. 293(5537): p. 2051-5. 

7. Plewczynski, D., S.A. Spieser, and U. Koch, Assessing different classification methods for 
virtual screening. J Chem Inf Model, 2006. 46(3): p. 1098-106. 

8. Azencott, C.A., et al., One- to four-dimensional kernels for virtual screening and the 
prediction of physical, chemical, and biological properties. J Chem Inf Model, 2007. 
47(3): p. 965-74. 

9. Willett, P., et al, Prediction of ion channel activity using binary kernel discrimination. J 
Chem Inf Model, 2007. 47(5): p. 1961-6. 

10. Byvatov, E. and G. Schneider, SVM-based feature selection for characterization of 
focused compound collections. J Chem Inf Comput Sci, 2004. 44(3): p. 993-9. 

1 1 . Karwath, A. and L. De Raedt, SMIREP: predicting chemical activity from SMILES. J 
Chem Inf Model, 2006. 46(6): p. 2432-44. 

12. Melville, J.L., E.K. Burke, and J.D. Hirst, Machine learning in virtual screening. Comb 
Chem High Throughput Screen, 2009. 12(4): p. 332-43. 

13. Engelbrecht, A.P., Computational Intelligence. 2007: John Wiley & Sons Ltd. 

14. Burton, J., et al., Virtual screening for cytochromes p450: successes of machine learning 
filters. Comb Chem High Throughput Screen, 2009. 12(4): p. 369-82. 

15. Do, C.B., C.S. Foo, and S. Batzoglou, A max-margin model for efficient simultaneous 
alignment and folding of RNA sequences. Bioinformatics, 2008. 24(13): p. i68-76. 

16. Gesell, T. and S. Washietl, Dinucleotide controlled null models for comparative RNA 
gene prediction. BMC Bioinformatics, 2008. 9: p. 248. 

17. Khandelwal, A., et al., Computational models to assign biopharmaceutics drug 
disposition classification from molecular structure. Pharm Res, 2007. 24(12): p. 2249-62. 

18. Ying, H., et al., A fuzzy discrete event system approach to determining optimal HIV/AIDS 
treatment regimens. IEEE Trans Inf Technol Biomed, 2006. 10(4): p. 663-76. 

19. Ginalski, K., et al., SD-Jury: a simple approach to improve protein structure predictions. 
Bioinformatics, 2003. 19(8): p. 1015-8. 

20. von Grotthuss, M., et al., Application of SD-Jury, GRDB, and Verify3D in fold 
recognition. Proteins, 2003. 53 Suppl 6: p. 418-23. 

21. Bujnicki, J.M., et al, Structure prediction meta server. Bioinformatics, 2001. 17(8): p. 
750-1. 

22. Lundstrom, J., et al., Peons: a neural-network-based consensus predictor that improves 
fold recognition. Protein Sci, 2001. 10(11): p. 2354-62. 



18 



23. Chivian, D., et al., Automated prediction of CASP-5 structures using the Robetta server. 
Proteins, 2003. 53 Suppl 6: p. 524-33. 

24. Chivian, D., et al., Prediction of CASP6 structures using automated Robetta protocols. 
Proteins, 2005. 61 Suppl 7: p. 157-66. 

25. Kim, D.E., D. Chivian, and D. Baker, Protein structure prediction and analysis using the 
Robetta server. Nucleic Acids Res, 2004. 32(Web Server issue): p. W526-31. 

26. Capobianco, E., Model validation for gene selection and regulation maps. Funct Integr 
Genomics, 2008. 8(2): p. 87-99. 

27. Hotz, C.S., S.J. Templeton, and M.M. Christopher, Comparative analysis of expert and 
machine-learning methods for classification of body cavity effusions in companion 
animals. J Vet Diagn Invest, 2005. 17(2): p. 158-64. 

28. Klon, A.E., M. Glick, and J.W. Davies, Combination of a naive Bayes classifier with 
consensus scoring improves enrichment of high-throughput docking results. J Med Chem, 
2004. 47(18): p. 4356-9. 

29. Plewczynski, D., Brainstorming: Consensus Learning in Practice. Frontiers in 
Neuroinformatics, 2009. 

30. Goldschmidt, Y.Y., et al., Nonequilibrium critical behavior in unidirectionally coupled 
stochastic processes. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics, 
1999. 59(6): p. 6381-408. 

3 1 . Hauert, C, Fundamental clusters in spatial 2x2 games. Proc Biol Sci, 2001 . 268(1468): p. 
761-9. 

32. Dolezal, J. and T. Hraba, Application of mathematical model of immunological tolerance 
to HIV infection. Folia Biol (Praha), 1988. 34(5): p. 336-41. 

33. Gerhardt, M., H. Schuster, and J.J. Tyson, A cellular automation model of excitable media 
including curvature and dispersion. Science, 1990. 247(4950): p. 1563-6. 

34. Ito, K. and Y.P. Gunji, Self-organization toward criticality in the Game of Life. 
Biosystems, 1992. 26(3): p. 135-8. 

35. Bartocci, E., et al., CellExcite: an efficient simulation environment for excitable cells. 
BMC Bioinformatics, 2008. 9 Suppl 2: p. S3. 

36. Chauvet, G.A., On the mathematical integration of the nervous tissue based on the S- 
propagator formalism. J Integr Neurosci, 2002. 1(1): p. 31-68. 

37. Kanai, M., et al., Ultradiscrete optimal velocity model: A cellular-automaton model for 
traffic flow and linear instability of high-flux traffic. Phys Rev E Stat Nonlin Soft Matter 
Phys, 2009. 79(5 Pt 2): p. 056108. 

38. Su, M., et al., Spatiotemporal dynamics of the epidemic transmission in a predator-prey 
system. Bull Math Biol, 2008. 70(8): p. 2195-210. 

39. Tyson, J.J. and M.C. Mackey, Molecular, metabolic, and genetic control: An 
introduction. Chaos, 2001. 11(1): p. 81-83. 

40. Conrad, M., et al, Towards an artificial brain. Biosystems, 1989. 23(2-3): p. 175-215; 
discussion 216-8. 

41 . Grim, J. and J. Hora, Iterative principles of recognition in probabilistic neural networks. 
Neural Netw, 2008. 21(6): p. 838-46. 

42. Kawato, M., From 'understanding the brain by creating the brain' towards manipulative 
neuroscience. Philos Trans R Soc Lond B Biol Sci, 2008. 363(1500): p. 2201-14. 

43. Yu, D., F. Liu, and P.Y. Lai, Input reconstruction of chaos sensors. Chaos, 2008. 18(2): 
p. 023106. 



19 



44. Lewenstein, M., A. Nowak, and B. Latane, Statistical mechanics of social impact. Phys 
Rev A, 1992. 45(2): p. 763-776. 

45. Latane, B., Am. Psychol, 1981(36): p. 343. 

46. Nowak, A., J. Szamrej, and B. Latane, Psychol. Rev., 1990(97): p. 362. 

47. Kohring, G.A., Ising models of social impact: The role of cumulative advantage. Journal 
De Physique I, 1996. 6(2): p. 301-308. 

48. Kohring, G.A., J. Phys. I France, 1996(6): p. 301-308. 

49. Plewczynski, D., Landau theory of social clustering. Physica A, 1998. 261(3-4): p. 608- 
617. 

50. Holyst, J.A., K. Kacperski, and F. Schweitzer, Phase transitions in social impact models 
of opinion formation. Physica A, 2000. 285(1-2): p. 199-210. 

5 1 . Kacperski, K. and J. A. Holyst, Phase transitions and hysteresis in a cellular automata- 
based model of opinion formation. Journal of Statistical Physics, 1996. 84(1-2): p. 169- 
189. 

52. Kacperski, K. and JA. Holyst, Opinion formation model with strong leader and external 
impact: a mean field approach. Physica A, 1999. 269(2-4): p. 511-526. 

53. Kacperski, K. and J. A. Holyst, Phase transitions as a persistent feature of groups with 
leaders in models of opinion formation. Physica A, 2000. 287(3-4): p. 631-643. 

54. Kuklov, A., N. Prokof ev, and B. Svistunov, Commensurate two-component bosons in an 
optical lattice: ground state phase diagram. Phys Rev Lett, 2004. 92(5): p. 050402. 

55. Tanaka, F., Thermor ever sib le gelation is a Bose-Einstein condensation. Phys Rev E Stat 
Nonlin Soft Matter Phys, 2006. 73(6 Pt 1): p. 061405. 

56. Bronski, J.C., et al., Stability of attractive Bose-Einstein condensates in a periodic 
potential. Phys Rev E Stat Nonlin Soft Matter Phys, 2001. 64(5 Pt 2): p. 056615. 

57. Hansson, T.H., J.M. Leinaas, and S. Viefers, Exclusion statistics in a trapped two- 
dimensional Bose gas. Phys Rev Lett, 2001. 86(14): p. 2930-3. 

58. Hodby, E., et al., Experimental observation of Beliaev coupling in a Bose-Einstein 
condensate. Phys Rev Lett, 2001. 86(11): p. 2196-9. 

59. Jackson, A.D., et al., Weakly interacting Bose-Einstein condensates under rotation: 
mean-field versus exact solutions. Phys Rev Lett, 2001. 86(6): p. 945-9. 

60. Staliunas, K, Midband solitons in nonlinear photonic crystal resonators. Phys Rev E Stat 
Nonlin Soft Matter Phys, 2004. 70(1 Pt 2): p. 016602. 

61 . Ponomarenko, S.A., N.M. Litchinitser, and G.P. Agrawal, Theory of incoherent optical 
solitons: beyond the mean-field approximation. Phys Rev E Stat Nonlin Soft Matter Phys, 
2004. 70(1 Pt 2): p. 015603. 

62. Uzunov, D.I., Derivation of effective field theories. Phys Rev E Stat Nonlin Soft Matter 
Phys, 2008. 78(4 Pt 1): p. 041 122. 

63. Ciach, A., W.T. Gozdz, and G. Stell, Field theory for size- and charge-asymmetric 
primitive model of ionic systems: mean-field stability analysis and pretransitional effects. 
Phys Rev E Stat Nonlin Soft Matter Phys, 2007. 75(5 Pt 1): p. 05 1505. 

64. Range, G.M. and S.H. Klapp, Density functional study of the phase behavior of 
asymmetric binary dipolar mixtures. Phys Rev E Stat Nonlin Soft Matter Phys, 2004. 
69(4 Pt 1): p. 041201. 

65. Buceta, J. and K. Lindenberg, Comprehensive study of phase transitions in relaxational 
systems with field-dependent coefficients. Phys Rev E Stat Nonlin Soft Matter Phys, 2004. 
69(1 Pt 1): p. 011102. 



20 



66. Chitanvis, S.M., Theory of polyelectrolytes in solvents. Phys Rev E Stat Nonlin Soft 
Matter Phys, 2003. 68(6 Pt 1): p. 061802. 

67. Pastor-Satorras, R. and R.V. Sole, Field theory for a reaction-diffusion model of 
quasispecies dynamics. Phys Rev E Stat Nonlin Soft Matter Phys, 2001. 64(5 Pt 1): p. 
051909. 

68. Drossel, B., H. Bokil, and M.A. Moore, Spin glasses without time-reversal symmetry and 
the absence of a genuine structural glass transition. Phys Rev E Stat Phys Plasmas Fluids 
Relat Interdiscip Topics, 2000. 62(6 Pt A): p. 7690-9. 

69. Plewczynski, D., unpublished results. 2009. 

70. Craig, R.A. and L. Liao, Phylogenetic tree information aids supervised learning for 
predicting protein-protein interaction based on distance matrices. BMC Bioinformatics, 
2007. 8: p. 6. 

71 . Del Carpio-Munoz, C.A., et al., MIAX: a new paradigm for modeling biomacromolecular 
interactions and complex formation in condensed phases. Proteins, 2002. 48(4): p. 696- 
732. 

72. Krause, R., C. von Mering, and P. Bork, A comprehensive set of protein complexes in 
yeast: mining large scale protein-protein interaction screens. Bioinformatics, 2003. 
19(15): p. 1901-8. 

73. Mamitsuka, H., Essential latent knowledge for protein-protein interactions: analysis by 
an unsupervised learning approach. IEEE/ ACM Trans Comput Biol Bioinform, 2005. 
2(2): p. 119-30. 

74. Tsai, R.T., et al., Exploiting likely-positive and unlabeled data to improve the 
identification of protein-protein interaction articles. BMC Bioinformatics, 2008. 9 Suppl 
l:p. S3. 

75. Conte, R., Agent-based modeling for understanding social intelligence. Proc Natl Acad 
Sci USA, 2002. 99 Suppl 3: p. 7189-90. 

76. Liu, J., W. Zhong, and L. Jiao, A multiagent evolutionary algorithm for constraint 
satisfaction problems. IEEE Trans Syst Man Cybern B Cybern, 2006. 36(1): p. 54-73. 

77. Pedrycz, W. and P. Rai, A multifaceted perspective at data analysis: a study in 
collaborative intelligent agents. IEEE Trans Syst Man Cybern B Cybern, 2008. 38(4): p. 
1062-72. 

78. Pedrycz, W. and P. Rai, A multifaceted perspective at data analysis: a study in 
collaborative intelligent agents. IEEE Trans Syst Man Cybern B Cybern, 2009. 39(4): p. 
834-44. 

79. Rocha, A.F., The brain as a symbol-processing machine. Prog Neurobiol, 1997. 53(2): p. 
121-98. 



21 



